Add TTFT benchmarks + update sparsity benchmarks #1140
Conversation
Summary: This PR adds a sparsity option to the LLaMa benchmarks.
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/1140
Note: links to docs will display an error until the docs builds have completed.
❌ 1 New Failure, 1 Unrelated Failure as of commit de2d447 with merge base 2f97b09:
NEW FAILURE - the following job has failed.
BROKEN TRUNK - the following job failed but was also present on the merge base. 👉 Rebase onto the `viable/strict` branch to avoid these failures.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
torchao/_models/llama/generate.py (Outdated)

```python
from torchao.dtypes import MarlinSparseLayout
quantize_(model, int4_weight_only(layout=MarlinSparseLayout()))
if sparsity and "semi" in sparsity:
    quantize_(model, int4_weight_only(layout=MarlinSparseLayout()))
```
This isn't using any of the derived variables. It should either use the derived ones or be moved to a separate section.
lgtm if you move the marlin stuff so it's clearer which derived variables it actually uses
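A rough sketch of the kind of restructuring being asked for, assuming the surrounding code derives values such as `group_size` and `use_hqq` from the quantization string (those names and the helper below are illustrative, not the code from this PR):

```python
from torchao.quantization import quantize_, int4_weight_only
from torchao.dtypes import MarlinSparseLayout

def apply_int4_quant(model, quantization: str, sparsity: str | None,
                     group_size: int, use_hqq: bool):
    """Hypothetical helper showing the separation of the two paths."""
    if "int4wo" not in quantization:
        return
    if sparsity and "semi" in sparsity:
        # The Marlin sparse path picks its own layout and does not use the
        # derived group_size / use_hqq values, so it sits in its own branch.
        quantize_(model, int4_weight_only(layout=MarlinSparseLayout()))
    else:
        # The dense int4 path is the one that actually consumes the derived variables.
        quantize_(model, int4_weight_only(group_size=group_size, use_hqq=use_hqq))
```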
torchao/_models/llama/generate.py
Outdated
print(f"Peak Memory Usage: {mem:.02f} GB") | ||
print(f"Model Size: {model_size:.02f} GB") | ||
if write_result: | ||
result_txt = f"\n{datetime.today().strftime('%Y%m%d%H%M%S')}, tok/s={tokpersec:6.2f}, mem/s={bandwidth:7.2f} GB/s, peak_mem={mem:5.2f} GB, model_size={model_size:5.2f} GB " | ||
result_txt += f"quant: {quantization}, mod: {checkpoint_path.parent.name}, kv_quant: {kv_cache_quantization}, compile: {compile}, compile_prefill: {compile_prefill}, dtype: {precision}, device: {device} " | ||
result_txt = f"\n{datetime.today().strftime('%Y%m%d%H%M%S')}, tok/s={tokpersec:6.2f}, mem/s={bandwidth:7.2f} GB/s, time={t:5.4f} sec, peak_mem={mem:5.2f} GB, model_size={model_size:5.2f} GB " |
"time" is a really generic term: is this TTFT or the overall run? The tok/s info is already the non-prefill indicator, so TTFT (time to do prefill) is probably more valuable.
It's the overall time, but I limit num_tokens to 1. I can make this a bit clearer though, maybe with a --ttft flag that forces num_tokens_generated to be 1.
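A minimal sketch of what such a flag could look like (argument names assumed for illustration, not the final implementation in this PR):

```python
import argparse

parser = argparse.ArgumentParser(description="LLaMa benchmark runner (sketch)")
parser.add_argument("--max_new_tokens", type=int, default=200,
                    help="Number of new tokens to generate.")
parser.add_argument("--ttft", action="store_true",
                    help="Benchmark time-to-first-token by generating a single token.")
args = parser.parse_args()

if args.ttft:
    # With exactly one generated token, the measured end-to-end time is
    # dominated by prefill, i.e. it approximates time to first token.
    args.max_new_tokens = 1
```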
Force-pushed from 6858180 to 4fdfa7b (compare).
This PR adds TTFT (time-to-first-token) benchmarks to torchAO, and also updates the benchmarking script to handle sparsity a bit more cleanly and to use the 2:4 sparse checkpoints that are available. Additionally, it adds padding support for int8 dynamic quant + 2:4 sparsity, which was missing before.
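A hedged sketch of the padding idea for the int8 dynamic quant + 2:4 sparse path: if a linear weight's dimensions aren't aligned to what the semi-structured sparse kernels expect, pad them before sparsifying. The helper name and the alignment value of 64 below are assumptions for illustration, not the code added in this PR.

```python
import torch
import torch.nn.functional as F

def pad_weight_for_24_sparsity(weight: torch.Tensor, multiple: int = 64) -> torch.Tensor:
    """Pad a 2D linear weight so both dimensions are multiples of `multiple`.

    Semi-structured (2:4) sparse kernels generally require aligned shapes;
    64 is an assumed alignment here, not a value taken from this PR. Callers
    must pad activations / slice outputs to match the padded shapes.
    """
    out_features, in_features = weight.shape
    pad_out = (-out_features) % multiple
    pad_in = (-in_features) % multiple
    if pad_out == 0 and pad_in == 0:
        return weight
    # F.pad pads from the last dimension backwards: (left, right, top, bottom).
    return F.pad(weight, (0, pad_in, 0, pad_out))
```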